
fix(sandbox): make interactive connect resilient on stopped/resumed sandboxes #215

Merged

marc-vercel merged 3 commits into main from marc-vercel/fix-interactive-connect-resume on Jun 2, 2026

@marc-vercel
Collaborator

Problem

`sandbox connect` (and interactive `sandbox exec`) could hang indefinitely on "Waiting for connection...", or fail in a confusing way, when run against a stopped sandbox that had to be resumed. Against an already-running sandbox it worked reliably, which is why the problem showed up only intermittently, after a stop/resume.

Several independent issues combined to produce this:

  1. Real connection errors were swallowed. Once the connection handshake landed, the abort signal that stops the "did the command exit early?" check was also used to filter errors from `attach()`, so any failure that happened after the handshake (for example, the resumed session not yet exposing a route for the interactive port) was silently discarded instead of surfaced.

  2. The spinner kept the process alive. The progress spinner's teardown called `ora.clear()`, which erases only the current frame and leaves the render interval running. That timer keeps Node's event loop alive, so on any early teardown the CLI sat on the spinner forever instead of exiting.

  3. Early server exits were opaque. When the in-sandbox interactive server exited before connecting, the CLI showed only a generic "may have timed out" hint, with no detail about the cause.

  4. The in-sandbox server trusted a stale config. `pty-tunnel-server` decided whether a server was already running purely from a leftover config file and a liveness check on the PID recorded in it. Across a snapshot/resume, that config is restored from the snapshot while the original process is gone, so a coincidentally reused PID made it connect to a dead socket and exit.
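A minimal TypeScript sketch of issue 1, assuming the CLI runs `attach()` alongside an early-exit watcher that stops checking once the handshake aborts its signal. The names `Attach`, `WatchEarlyExit`, `connectSwallowing`, and `connectSurfacing` are hypothetical and not taken from the PR; the sketch only contrasts filtering every error through the handshake signal with filtering the watcher alone.

```typescript
// Hypothetical shapes: attach() calls onHandshake() once the connection lands;
// the watcher rejects if the command exits early and resolves once `signal` aborts.
type Attach = (onHandshake: () => void) => Promise<void>;
type WatchEarlyExit = (signal: AbortSignal) => Promise<void>;

// Buggy shape: the handshake signal filters every error, including attach()'s,
// so a failure after the handshake is silently discarded.
async function connectSwallowing(attach: Attach, watch: WatchEarlyExit): Promise<void> {
  const handshake = new AbortController();
  const attached = attach(() => handshake.abort());
  const watched = watch(handshake.signal);
  try {
    await Promise.all([attached, watched]);
  } catch (err) {
    if (handshake.signal.aborted) return; // post-handshake attach() error is lost here
    throw err;
  }
}

// Fixed shape: only the early-exit watcher is filtered by the handshake signal;
// attach() rejections always reach the caller.
async function connectSurfacing(attach: Attach, watch: WatchEarlyExit): Promise<void> {
  const handshake = new AbortController();
  const watched = watch(handshake.signal).catch((err: unknown) => {
    if (!handshake.signal.aborted) throw err;
  });
  await Promise.all([attach(() => handshake.abort()), watched]);
}

// Example inputs: a resumed session that completes the handshake, then fails.
const failingAttach: Attach = async (onHandshake) => {
  onHandshake();
  throw new Error("no route for interactive port");
};
const watchUntilAborted: WatchEarlyExit = (signal) =>
  new Promise<void>((resolve) => {
    if (signal.aborted) return resolve();
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
```

With these inputs, `connectSwallowing` resolves as if nothing went wrong, while `connectSurfacing` rejects with the "no route for interactive port" error, which is the behavior the fix aims for.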

Solution

  • Stop funneling `attach()` through the connection-established abort filter, so genuine connection failures propagate instead of being swallowed.
  • Always `stop()` the spinner on teardown (rather than only `clear()`), so a failure before the connection is established can no longer hang the process.
  • Include the in-sandbox server's stderr in the error when it exits before connecting, so the real cause is visible.
  • Have `pty-tunnel-server` health-check a server before reusing it, and remove any leftover config before spawning a new one, so a stale config restored from a snapshot can no longer cause a connection to a dead socket.
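The health-check-before-reuse idea can be sketched as follows. It is written in TypeScript for consistency with the rest of this page, and it makes assumptions the PR does not state: the config is JSON with `pid` and `socketPath` fields, the server listens on a Unix socket, and a successful connect counts as healthy. `reuseOrReset`, `pidAlive`, and `healthCheck` are hypothetical names, not `pty-tunnel-server`'s actual code.

```typescript
import * as fs from "node:fs";
import * as net from "node:net";

// Assumed config shape; the real config.json may contain different fields.
interface ServerConfig {
  pid: number;
  socketPath: string;
}

// Signal 0 only checks existence; EPERM means the process exists but is not ours.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// A live PID is not enough after a snapshot/resume (it may have been reused),
// so also require that something accepts connections on the recorded socket.
function healthCheck(socketPath: string, timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const sock = net.createConnection(socketPath);
    const done = (ok: boolean) => {
      sock.destroy();
      resolve(ok);
    };
    sock.setTimeout(timeoutMs, () => done(false));
    sock.once("connect", () => done(true));
    sock.once("error", () => done(false));
  });
}

// Returns the config to reuse, or null after deleting a stale or unreadable
// config so the caller can spawn a fresh server.
async function reuseOrReset(configPath: string): Promise<ServerConfig | null> {
  if (!fs.existsSync(configPath)) return null;
  try {
    const cfg = JSON.parse(fs.readFileSync(configPath, "utf8")) as ServerConfig;
    if (pidAlive(cfg.pid) && (await healthCheck(cfg.socketPath))) return cfg;
  } catch {
    // Unparseable config: treat it as stale.
  }
  fs.rmSync(configPath, { force: true });
  return null;
}
```

Pointing a config at the current process's own PID mimics the reused-PID case: the liveness check passes, but with nothing listening on the socket the config is still discarded.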

Together these turn the previous silent hang into either a working connection or a fast, legible error.
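One way to build such a legible error when the in-sandbox server exits before connecting is sketched below, assuming the CLI has the server's exit code and captured stderr at hand. `ServerExit` and `earlyExitError` are hypothetical names, and the message wording is illustrative rather than the CLI's actual output.

```typescript
// Assumed shape of what the CLI knows once the in-sandbox server has exited.
interface ServerExit {
  exitCode: number | null;
  stderr: string;
}

// Builds an error that carries the server's own stderr instead of a generic
// "may have timed out" hint; long output is trimmed to its last `maxChars`.
function earlyExitError(exit: ServerExit, maxChars = 2000): Error {
  const stderr = exit.stderr.trim();
  const tail = stderr.length > maxChars ? "..." + stderr.slice(-maxChars) : stderr;
  const code = exit.exitCode === null ? "unknown" : String(exit.exitCode);
  const detail =
    tail.length > 0 ? `Server stderr:\n${tail}` : "The server wrote nothing to stderr.";
  return new Error(`Interactive server exited before connecting (exit code ${code}).\n${detail}`);
}
```

Keeping only the tail of stderr is a design choice in this sketch: the last lines usually hold the fatal error, and trimming keeps a noisy server from flooding the terminal.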

🤖 Generated with Claude Code

@vercel
Contributor

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| sandbox | Ready | Preview, Comment, Open in v0 | Jun 2, 2026 6:41pm |
| sandbox-cli | Ready | Preview, Comment, Open in v0 | Jun 2, 2026 6:41pm |
| sandbox-sdk | Ready | Preview, Comment, Open in v0 | Jun 2, 2026 6:41pm |
| sandbox-sdk-ai-example | Ready | Preview, Comment, Open in v0 | Jun 2, 2026 6:41pm |
| workflow-code-runner | Ready | Preview, Comment, Open in v0 | Jun 2, 2026 6:41pm |

marc-vercel and others added 2 commits June 2, 2026 17:17
…andboxes

`sandbox connect` could hang on "Waiting for connection..." or fail when run
against a stopped/resumed sandbox. Four independent issues:

- The CLI swallowed real `attach()` failures: once the connection handshake
  landed, the same abort signal used to stop the premature-exit check also
  discarded any later `attach()` error, so failures were never surfaced.
- The spinner's disposer called `ora.clear()` instead of `stop()`, leaving the
  render interval running and keeping the event loop (and the CLI) alive
  indefinitely on teardown.
- When the interactive server exited early, the generic error hid the actual
  cause; we now include the server's stderr.
- The in-sandbox server (pty-tunnel-server) trusted a leftover
  /tmp/vercel/interactive/config.json restored from a snapshot whenever its
  recorded PID happened to be alive, connecting to a dead socket. It now
  health-checks a reused server and removes the stale config before spawning a
  fresh one.
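The `clear()` versus `stop()` difference can be reproduced without ora. The `MiniSpinner` class below is a hypothetical stand-in that mimics the behavior described above (`clear()` erases the frame while the render interval keeps running; `stop()` also cancels the interval). It is not ora's implementation, and `disposeSpinner` is an illustrative name for the CLI's disposer.

```typescript
// Hypothetical ora-like spinner, only to show why clear() alone keeps Node alive.
class MiniSpinner {
  private timer: ReturnType<typeof setInterval> | undefined;
  private frame = 0;
  private readonly frames = ["-", "\\", "|", "/"];

  constructor(private readonly write: (chunk: string) => void) {}

  start(text: string): this {
    this.timer = setInterval(() => {
      this.frame = (this.frame + 1) % this.frames.length;
      this.write(`\r${this.frames[this.frame]} ${text}`);
    }, 80);
    return this;
  }

  // Erases the current frame only; the interval (and the event loop) stays alive.
  clear(): this {
    this.write("\r\x1b[K");
    return this;
  }

  // Cancels the render interval, then erases the frame.
  stop(): this {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
    return this.clear();
  }

  get isSpinning(): boolean {
    return this.timer !== undefined;
  }
}

// The fix in disposer form: always stop(), so teardown never leaves a live timer.
function disposeSpinner(spinner: MiniSpinner): void {
  spinner.stop();
}
```

After only `clear()`, `isSpinning` stays true and the pending interval keeps the process running; `stop()` releases it so Node can exit.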

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread .changeset/fix-interactive-connect-resume.md
Comment thread packages/vercel-sandbox/src/sandbox.ts Outdated
marc-vercel merged commit b4455af into main on Jun 2, 2026
14 checks passed
marc-vercel deleted the marc-vercel/fix-interactive-connect-resume branch on Jun 2, 2026 19:20
github-actions (bot) mentioned this pull request on Jun 2, 2026
marc-vercel added a commit that referenced this pull request Jun 2, 2026
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## sandbox@3.1.1

### Patch Changes

- Fix `sandbox connect` hanging or failing on a stopped/resumed sandbox.
The interactive shell now surfaces `attach()` failures instead of
swallowing them once the connection handshake lands, always stops the
spinner on teardown (so a failure can no longer hang the process), and
includes the in-sandbox server's stderr when the interactive server
exits early. The in-sandbox `vc-interactive-server` also health-checks a
reused server before trusting a leftover config file, so a stale
`/tmp/vercel/interactive/config.json` restored from a snapshot no longer
causes it to connect to a dead socket.
([#215](#215))

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Codina <marc.codina@vercel.com>